Citizen science data reveal regional heterogeneity in phenological response to climate in the large milkweed bug, Oncopeltus fasciatus

Abstract Regional populations of geographically widespread species may respond to different environmental factors across the species' range, generating divergent effects of climate change on life‐history phenology. Using thousands of citizen science observations extracted from iNaturalist and associated with corresponding temperature, precipitation, elevation, and daylength information, we examined the drivers of adult mating and of nymphal phenology, development, and group size for populations of the large milkweed bug, Oncopeltus fasciatus, in different ecoregions. Research‐grade iNaturalist images were correctly identified 98.3% of the time and yielded more than 3000 observations of nymphal groups and 1000 observations of mating adults spanning 18 years. Mating phenology showed distinct regional patterns, ranging from year‐round mating in California to temporally restricted mating in the Great Lakes Northeastern Coast ecoregion. Relative temperature increases of 1°C for a given daylength expanded the mating season by more than a week in western ecoregions. While increases in relative temperature delayed mating phenology in all ecoregions, greater winter precipitation advanced mating in the California ecoregion. In the eastern ecoregions, nymphal phenology was delayed by increases in summer rainfall but was advanced by relative temperature increases, whereas in western regions, relative temperature increases delayed nymphal phenology. Furthermore, accumulated growing degree days (AGDD) was a poor predictor of developmental progression, as we found a positive but weak correlation between AGDD and age structure only for the Appalachian Southeast North America and the Great Lakes Northern Coast ecoregions. These complex phenological responses of O. fasciatus are just one example of how populations may be differentially susceptible to a diversity of climatic effects; using data across a species' whole distribution is critical for exposing regional variations, especially for species with large, continental‐scale ranges. This study demonstrates the potential of photodocumented biodiversity data to aid in the monitoring of life history, host plant–insect interactions, and climate responsiveness.

the large milkweed bug, Oncopeltus fasciatus, in different ecoregions. Research-grade iNaturalist images were correctly identified 98.3% of the time and yielded more than 3000 observations of nymphal groups and 1000 observations of mating adults spanning 18 years. Mating phenology showed distinct regional patterns, ranging from year-round mating in California to temporally restricted mating in the Great Lakes Northeastern Coast ecoregion. Relative temperature increases of 1°C for a given daylength expanded the mating season by more than a week in western ecoregions.
While increases in relative temperature delayed mating phenology in all ecoregions, greater winter precipitation advanced mating in the California ecoregion. In the eastern ecoregions, nymphal phenology was delayed by increases in summer rainfall but was advanced by relative temperature increases, whereas in western regions, relative temperature increases delayed nymphal phenology. Furthermore, accumulated growing degree days (AGDD) was a poor predictor of developmental progression, as we found a positive but weak correlation between AGDD and age structure only for the Appalachian Southeast North America and the Great Lakes Northern Coast ecoregions. These complex phenological responses of O. fasciatus are just one example of how populations may be differentially susceptible to a diversity of climatic effects; using data across a species' whole distribution is critical for exposing regional variations, especially for species with large, continental-scale ranges. This study demonstrates the potential of photodocumented biodiversity data to aid in the monitoring of life history, host plant-insect interactions, and climate responsiveness.

| INTRODUC TI ON
Phenological alterations in life-history events are one of the most ubiquitous responses to global climate change. Insect responses include earlier hatching (Kiritani, 2013), better host synchrony (van Asch et al., 2013), and increased voltinism (Altermatt, 2010;Tobin et al., 2008). Resulting changes in insect communities may dramatically alter systems that rely on coordinated timing of reticulate species interaction networks (Beard et al., 2019;Chuine & Régnière, 2017).
Unfortunately, broad investigations into climate-driven changes in insect phenology have not kept pace with the explosive popularity of such studies in plants and vertebrates (Eckert et al., 2010;Lane et al., 2011;Renner & Zohner, 2018;Valenzuela et al., 2019;Visser et al., 2004). Specifically, studies have mainly been limited to adult life-history stages (Roy & Sparks, 2000), to charismatic taxa like butterflies (Kharouba et al., 2014), or to a few localities or well-studied regions (Park et al., 2021). These shortfalls prevent the analysis of developmental responsiveness to climate change for most insect taxa and preclude large-scale predictions for a variety of important factors, such as plant damage, insect outbreaks, or species losses.
Increasing amounts of digital data are readily available to address large-scale phenological questions for insects. For instance, digitized museum specimens have contributed millions of historical occurrence records (Nelson & Ellis, 2019) used to assess population change or revise range distributions (Halsch et al., 2021;Lemoine, 2015;Wilson et al., 2021), to estimate species phenology (Willis et al., 2017), and to document changes to species interactions (Garretson & Forkner, 2021;Meineke & Davies, 2019). In addition, image-based citizen science programs are a growing source of data documenting changes in the timing of plant and animal phenophases at a variety of spatial scales (Gerst et al., 2017(Gerst et al., , 2020Morisette et al., 2009;Wolkovich & Cleland, 2011). These digitized data also provide developmental information that can be used to investigate species interaction changes and species responses to climate variation on large geographic scales (Belitz et al., 2021;Groom et al., 2021).
Predicting insect phenological change remains complicated despite new data sources. Individual species may respond to minimum, maximum, or variance (extremes) in the daily or seasonal day or night temperatures or precipitation (Bale et al., 2002;Halsch et al., 2021;Kiritani, 2013). Insects' responses to abiotic variables may also interact with static factors, including daylength and topography, to produce complex outcomes (Spence & Tingley, 2020;Yee et al., 2017).
Insect phenological responsiveness to climate may also vary within species due to local adaptation (Melero et al., 2022). Phenological differences because of local adaptation may be particularly variable for insects with large historic distributions. Populations across wide geographic ranges may vary in dormancy, migration tendencies, timing or arrival of life stages, and host plant preferences, directly or indirectly generating high heterogeneity in responsiveness to climate.
A large geographic distribution may also buffer the impacts of climate change for these species (Bale et al., 2002), or climate variables may have greater predictive capacity across latitude when species span a greater number of ecoregions (Coroian et al., 2014).
The milkweed-arthropod community provides a model system for investigating large-scale patterns in the developmental responsiveness of insects to climate change. The milkweed community is highly studied because of the threatened status of the Asclepias-dependent monarch butterfly (Danaus plexippus) and the presence of multiple mimicry complexes (Brower, 1958;Duffey & Scudder, 1972). Danaus plexippus exhibits one of the most spectacular animal migrations (Brower, 1977;Urquhart, 1976), and its declines have sparked many citizen science programs (Ries & Oberhauser, 2015;Schultz et al., 2017). Since the 1940s, approximately 17% of publications on monarchs have relied on citizen science data (Ries & Oberhauser, 2015). Less attention has been paid to other milkweed-associated arthropods, despite the importance of diverse and healthy milkweed communities to monarch survival (Stevenson et al., 2021). The large milkweed bug, Oncopeltus fasciatus Dallas (Hemiptera: Lygaeidae), is a gregarious Asclepias seed feeder found from Southern Canada to Central America. Hemipterans are understudied relative to their species richness (Andrew et al., 2013), but early work on Oncopeltus focused on the effects of temperature on life-history variation (Baldwin & Dingle, 1986;Dingle et al., 1980;Miller & Dingle, 1982 as adults regardless of daylength, leading to differences in age at first reproduction from as few as 10 days to as much as 102 days (Dingle, 1974). Furthermore, nondiapausing females in these experiments had lower clutch sizes (Dingle, 1974). Temperatures that speed degree-day-driven nymphal development and simultaneously expose fifth instars to warmer conditions may override diapause induction, reduce clutch sizes, and interfere with migration.
The variation in clutch size in O. fasciatus between diapausing and nondiapausing females as well as with temperature (Baldwin & Dingle, 1986) suggests that interactions between daylength and climate change may not only affect diapause and, thus, migration but may also affect nymphal survival, as nymphs experience higher survival in larger feeding aggregations. Furthermore, timing egg hatch and nymphal development to the availability of pods and seeds in this species is critical as nymphs fed vegetative plant parts have stunted growth and adults fail to reproduce (Ralph, 1976). Such life history traits may combine with a wide geographic range to produce complex responses to climate across populations in ecoregions that vary substantially in abiotic factors.
To investigate heterogeneity in climatic responsiveness of adult and nymphal phenology for this wide-ranging insect, we developed

| Annotation of Oncopeltus fasciatus life history
We queried the iNaturalist dataset (Nugent, 2018)  Research-grade observations are those for which at least two-thirds of community-provided identifications agree on the species-level identification. We excluded images with obscured or private coordinates, as well as observations with positional accuracy >100 km. The remaining occurrences (n = 12,240) included a geo-referenced location of occurrence, an observation date, and an image of the insect(s).
Next, we excluded images depicting pinned specimens or a specimen in a container or on a hand, images too blurry to count individuals, images that appeared digitally manipulated (e.g., a screenshot from another device), and images in which individuals could not be confirmed to be O. fasciatus. We excluded images for which O. fasciatus was not the correct identification and updated those observations on iNaturalist accordingly (n = 187, see Section 3). In total, this yielded 11,224 records with full spatial and climate data spanning from June 2003 to May 2020. Finally, we used the Similar Species endpoint in the iNaturalist API (https://api.inatu ralist.org/v1/docs/) to extract counts of incorrect species-level identifications on observations later confirmed to be O. fasciatus. We used these counts to create a Sankey diagram illustrating the process of correct identification from initial submission to research-grade confirmation as O. fasciatus.
For each of the resulting records, we annotated the first observation image for the number of adults and subadults, the presence of mating adults, and the part of the plant occupied by the insects.
For the part of the plant, we assigned images to one of the following based on the location of insects in the image: flowers, open pods, closed pods, stem or branch, leaves, or not on plants (most frequently on the ground or non-natural structures). We did not identify all host plants because photographs often lacked diagnostic characteristics. For each observation, we extracted the elevation from the Amazon Web Services terrain tiles (Larrick et al., 2020) using the elevatr package in R (version 0.3.1, Hollister et al., 2020). We also extracted the minimum and maximum temperatures and precipitation on the observation date and during the preceding winter from the Daymet gridded climate data using the daymetr package in R (version 1.4 Hufkens et al., 2018;Thornton et al., 2020). We further used the Daymet data to estimate accumulated growing degree days (AGDD) starting from January 1 of the observation year with a lower threshold of 14°C and a maximum threshold of 38°C (Feir, 1974;Lin et al., 1954). Lin et al. (1954)

| Assignment to climate regions
We categorized observations into ecoregions by location ( Figure 1

| Analysis of nymphal group size and plant part occupancy
We investigated whether the nymphal group size varied by the part of the plant that the group occupied (see Annotation methods described above). To test whether the trends observed in the models of plant parts were explained by underlying climate variability, we then used analysis of covariance (ANCOVA) to determine whether the average number of subadults in a nymphal group varied significantly across the fixed factors plant part, ecoregion, and their interaction, using elevation, latitude, annual mean temperature, and annual total precipitation as covariates. In fixed factors that were significant in the ANCOVA, we used a post hoc Tukey HSD test to compare the means between groups.

| Analysis of mating and nymphal group phenology
To assess how the yearly patterns of mating or the presence of nymphs varied across the O. fasciatus range, we fit a Generalized Additive Model (GAM) with day of year as a smoothed predictor to the probability that an iNaturalist image depicted (1) mating insects or (2) nymphal groups (3+ subadults) across each ordinal day of year (DOY). Unlike our model of phenology described below, in which we know a photodocumented occurrence represents a specific date on which that phenophase occurred at that location, our "absences" of this phenology are not "true absences," and are likely to be biased by the collector behavior (Boakes et al., 2010;Courter et al., 2013). Due to these limitations, we did not generate more complex models of the probability of mating and instead comment on the general trends suggested by the patterns of mating observation. To determine the sensitivity of mating and nymphal phenology to climate variables, we then limited analyses to only observations that depicted mating or nymphal groups, excluding images of individual O. fasciatus. Because DOY both dictates daylength and is highly positively correlated with temperature (i.e., in the northern hemisphere, longer days in the summer have higher temperatures and shorter days in the winter have lower temperatures), and because we wanted to account for the influence of temperature on mating season and nymphal development separately from solar insolation, we first regressed maximum daily temperature on daylength using a General Additive Model and daylength as a smoothed predictor for each observation in the full dataset and computed the residuals (Figures S1 and S2).
Positive residual values indicate that temperatures were warmer than average for that DOY/daylength and negative values indicate that temperatures were cooler than average for that DOY. Average daylength ranged from 9.1 to 15.6 for observations in our dataset.
With primarily presence-only data from opportunistically collected data, fitting models to all available phenological presence dates to produce an estimate of mean date is more robust to changes in sampling size over time and space than methods to model first or last dates of phenophase occurrence (Jones & Daehler, 2018). Therefore, to model mean mating date, we fit GAMs for the full dataset and for each ecoregion to the mating occurrence dates as a function of the residual temperature, winter precipitation (mm), and winter maximum temperature (°C), the elevation of the observation, and a smoothed two-dimensional function of the latitude and longitude (Fang & Feng, 2013;Yee & Mitchell, 1991). We selected winter climate as the relevant season for predicting mating behavior because O. fasciatus exhibits facultative reproductive diapause, which is particularly sensitive to declining temperatures and photoperiod (Dingle, 1974;Tauber et al., 1986). To model mean nymphal group dates, we fit GAMs for the full model and for each ecoregion to the occurrence dates as a function of residual temperature, summer precipitation, and summer maximum temperature, elevation of the observation, and a smoothed two-dimensional function of the latitude and longitude. Contrary to the models of mating adults, we instead selected summer climate as a predictor of nymphal phenology because previous work in this species has demonstrated a close link between ambient temperatures and nymphal aggregations (Barrett & Chiang, 1967;Bongers, 1969). Modeling the latitude and longitude of an observation simultaneously allows models to better account for spatial autocorrelation than linear models and reveal complex patterns of spatial variation in response variables within each ecoregion (Segurado et al., 2006). We fit and graphed models using the mgcv R (version 0.1.6, Wood, 2004Wood, , 2011Wood, , 2017 and mgcViz (version 1.8-36, Fasiolo et al., 2020) packages.

| Analysis of developmental progression
For each nymphal group (3+ subadults), we calculated the proportion of adults in the total number of individuals depicted as a proxy of the developmental progression of the nymphal group. We selected this metric as a proxy for developmental progression because we could not fully assess the developmental stage of subadults and assign them to nymphal instar(L1-5) developmental stages based on photographs. Further, this metric may be influenced by adult behavior patterns and we are not able to capture initial cluster conditions using these opportunistically collected data. Because of these limitations, we opted for a relatively simple analysis to determine whether we could detect any relationship between the accumulated growing degree days or day of year and elevation for these observations. We calculated the Pearson's R correlation coefficient between AGDD or DOY on the observation date and proportion of adults for each ecoregion. For regions that had significant correlation coefficients, we split observation by elevation (<50 m, 50-200 m, 200+ m) and calculated the correlations between AGDD or DOY and proportions of adults in each elevation category. However, in the supplemental materials, we also provide results for a more complex general linear model with a quasibinomial family and the day of year (or AGDD), elevation, and the interaction term (Table S1). We found that the model fit was quite low, as we expected from the aforementioned sources of additional variation in these data, so include only correlation outputs in the main text.

| Identification
Images depicted an average of 6.63 ± 12.05 (mean ± SD) O. fasciatus individuals. Observation photos depicting a single adult were the most common observation type, accounting for 50.58% of our data.
For the 11,224 observations, iNaturalist contributors made 22,279 total attempts at species-level identifications, spanning 24 total taxa.
Contributors' initial identifications were incorrect for 1356 (6.1%) of these attempts, of which 53.8% (730) were misidentified as Lygaeus, 26.3% (356) misidentified as other Oncopeltus species, and 6.8% (92) misidentified Tetraopes (Figure 1). Despite incorrect first identifications, within the research-grade dataset only 1.7% (187) of the observations did not depict O. fasciatus and required correction. As with incorrect contributor initial identifications, the majority of the incorrect research-grade images were corrected to Lygaeus species.
Occasionally, two or one subadults were photographed alone (4.26%), but these did not meet our criteria for nymphal groups and were excluded from the analysis. The average nymphal group size (the number of subadults within a group) was 18.29 ± 16.21 (mean ± SD).

| Mating phenology
Mating adults occurred in 1006 (9%) of observations (Figure 2). In 6.94% of observations, multiple adults were observed, but they were not mating and were not included in this analysis. Mating APSE ecoregion showed a gradual increase in mating adults, with peak mating in July and subsiding by October (Figure 3). Populations in the GLNC ecoregion showed a single, sharp peak in mating behavior in early August, and a steep decline in mating probability, reaching zero probability by mid-September (Figure 3).
Generalized Additive Models showed that increases in the amount of precipitation and increases in winter maximum temperatures for the winter prior to collection were significantly associated with advances in the timing of mating for populations in the CALC ecoregions (Table 1). Residual temperature on the collection date was a significant predictor of mating phenology for all ecoregions in our study, but the direction of this effect varied (Table 1). In the majority of ecoregions and in the full model, higher temperatures than average for the given daylength were associated with mating adults occurring at later dates in the season. SCGP, CALC, and APSE populations delayed mating by 9.9, 5.3, and 2.5 days/°C increase in residual temperature, respectively. By contrast, higher residual temperatures on a given day resulted in a 1.5 days/°C advancement of mating times in GLNC. Increases in elevation were associated with later mating only for the SCGP ecoregion.
Latitude and longitude were significant predictors of the timing of mating for the full model, which indicated that populations in the GLNC mated nearly 100 to 150 days later than populations in the extreme southern portions of the range (southwestern California, south central Texas, and southern Florida, Figure 4). However, when ecoregions were modeled separately, only the SCGP and APSE ecoregions showed significant spatial variation in mating phenology ( Figure 4). In the SCGP regions, populations in southwestern California mated more than 150 days earlier than those in the central midwestern states. In the APSE region, populations in southern TA B L E 1 Summary of parameter estimates for generalized additive models (GAM) for mating adult (top) and nymphal (bottom) phenology, including number of observations (N), percent deviance explained for model (%Dev Exp), and coefficients (β ± SE) for residual temperature (see Section 2), elevation, total winter precipitation, and maximum winter temperature.  (long) were included in the model as smoothed two-dimensional terms to account for spatial variation (see Figure S1), and estimated degrees of freedom (edf) are provided. * .05 < p > .01, ** 01 < p > .001, *** p < .001.

Mean DOY ± SD
Florida mated a month or more prior to those in the Appalachian and Blue Ridge Mountain areas.
Generalized Additive Models showed that nymphal phenology was associated with the summer precipitation for the eastern ecoregions (APSE and GLNC, Patterns of spatial variation in nymphal phenology were similar to those of adults ( Figure 5). Latitude and longitude were not significant in models of nymphal phenology in the CALC or SCGP ecoregions (Table 1, Figure 5). Increases in elevation were associated with the later appearance of nymphs for both the GLNC and CALC ecoregions.

| Developmental progression
Age structure (the proportion of adults within a group) was positively but weakly correlated with AGDD for the APSE (Pearson's correlation coefficient: R = .1, p = .0002), as well as the GLNC ecoregion (R = .12, p < .0001) (Figure 6). The proportion of adults in observations from mid (50-200 m) elevations in APSE and GLNC and high F I G U R E 3 The probability that an observation of Oncopeltus fasciatus depicted a mating adult (solid green) or a nymphal group (dashed blue) across the ordinal day of year for the four ecoregions (as in Figure 1). Each point is an observation assigned a 1 (mating adults or nymphal group present) or 0 (mating adults or nymphal group not depicted).
Curves are GAM best-fit curves showing the probability of occurrence across the year for each life stage with gray 95% confidence intervals.

F I G U R E 4
Spatial representation of mean mating occurrence overall (top) and by ecoregion (bottom). Two-dimensional smoothed curves of latitude and longitude fit simultaneously in GAM. Lighter (yellow) hues indicate delays (later) in the presence of mating or nymphs relative to the mean for that ecoregion, Darker (blue) hues indicate advances (earlier) in mating and nymphs relative to the mean.  Figure 6). The results for DOY and elevation are presented in Figure S3 and largely followed the results with AGDD.
The variability of the developmental progression around this correlation is highly significant, so the explanatory power of these models is quite low, so future work using finer-grained measures of developmental progression may provide more insight into future studies.

| DISCUSS ION
Our study unveiled complexity in the phenological responses of Oncopeltus fasciatus life-history stages to climate variables across the continental range of this insect. Using image-based citizen F I G U R E 5 Spatial representation of mean nymphal group occurrence overall (top) and by ecoregion (bottom). Two-dimensional smoothed curves of latitude and longitude fit simultaneously in GAM. Lighter (yellow) hues indicate delays (later) in the presence of mating or nymphs relative to the mean for that ecoregion, darker (blue) hues indicate advances (earlier) in mating and nymphs relative to the mean.

F I G U R E 6
The proportion of Oncopeltus fasciatus adults in nymphal groups as a function of the day of the year. Gray area is the standard error. Developmental progressions were further subdivided by elevation (bottom) and Pearson's correlation coefficients (R) provided for each elevation category. *** indicated with P < .0001.
science data, we found that the drivers of mating and nymphal phenology differed across ecoregions, with nymphs and adults in eastern and western regions responding to different abiotic variables or in different directions to the same variable. The iNaturalist data were overwhelmingly a reliable source for correctly identified, scientifically beneficial observations for this species. Together, our findings underscore the utility of photodocumented citizen science observations to monitor species interactions, life history, and climate impacts for wide-ranging species.

| Mating
Our assessment across the entire O. fasciatus range found that after controlling for the correlation between maximum temperature and daylength, relatively higher temperatures for a given day were associated with later, rather than earlier, mating dates. We interpret this to mean that relatively warmer temperatures expanded the length of the mating season. However, the extent of this seasonal expansion differed by ecoregion. In particular, populations in the CALC ecoregion mated year-round, but mating occurred earlier when winter precipitation was higher and expanded by 5 days or more for each 1°C relative temperature increase. The importance of winter precipitation to O. fasciatus in the CALC region corresponds to recent publications indicating a strong role for precipitation in limiting Asclepias distributions in arid regions, though studies of plants point to summer rather than winter precipitation as a driving factor (Lemoine, 2015).
In contrast to phenological responses in CALC, populations in the GLNC ecoregion, where the annual mean temperature was below 27°C, mated 1 to 2 days earlier per 1°C relative temperature increase, suggesting that shorter daylengths prevent a longer expansion of the mating season in northern populations. Our results support calls for investigations of the role of solar seasonality in the physiological responses of plants and insects to climate change (Spence & Tingley, 2020). Climate is often assumed to be the primary factor contributing to phenological processes, but daylength or photoperiod also influences insect physiological processes.
Oncopletus fasciatus uses photoperiod cues to migrate northward from southern populations in spring and southward from northern populations in fall (Sauer & Feir, 1973). Females enter reproductive diapause in response to shortened daylength prior to migratory flights (Dingle, 1974) and can show phenotypic plasticity related to their ability to reabsorb eggs (Attisano et al., 2013). Early laboratory studies showed that O. fasciatus females reared at 27°C did not enter reproductive diapause even under short photoperiods (Dingle, 1974;Dingle et al., 1980). Should temperatures in the GLNC ecoregion increase above 27°C, southward migration may be disrupted in this species.
Our results suggest that species shifting to more northward ranges to adapt to changing climatic conditions may be constrained by photoperiod (Spence & Tingley, 2020). For instance, temperature alone does not facilitate northward expansion for many plant species (Bjorkman et al., 2017), which may threaten the stability of species interactions, particularly for specialist herbivores like O. fasciatus (but see Renner & Zohner, 2018). For wide-ranging compared with more locally distrib-

| Nymphs
Responses of nymphs to relative changes in temperature differed substantially over the species' range. Rather than advancing, nymphal presence in California populations was approximately 5 days later for each 1°C of relative temperature increase. As with adults, because nymphs are present all year in the CALC ecoregion, we interpret this to indicate greater numbers of nymphs in later months and an expansion of nymphal feeding season in western regions.
These delays were the opposite of nymphal responses in eastern regions, where increases in relative temperature advanced nymphal phenology, highlighting large-scale and high regional variation in climate sensitivity in this species. Nymphal timing was also delayed by increases in summer precipitation in eastern ecoregions, which may relate to decreased solar insolation during periods of cloud cover.
During development, nymphs rely on seed pod production in Asclepias sp. (Miller & Dingle, 1982); milkweed pods may be available continuously in southern climates but are only present seasonally in more northerly regions. Thus, low-latitude regions sustain yearround populations, and northern populations fluctuate and are replenished by migrants from the southern regions (Leslie, 1990). Baldwin and Dingle (1986) found that increased temperature de- with seed pod availability (Baldwin & Dingle, 1986). Nymphal aggregation can contribute to greater feeding efficiency and increase the rate of juvenile development by assisting juveniles in piercing the pod wall and digesting seeds (Ralph, 1976). We observed nymphal aggregations more often on seed pods and those aggregations on seed pods were larger than those on plant leaves or stems ( Figure 2).
Nymph clusters disperse when exploring for a new host plant, later re-aggregating on to feed again in groups (Ralph, 1976). We often observed nymphs on parts of the plant other than seed pods, particularly when leaves and branches were nearby and the cluster was too big to fit on the closed pods available. In fact, our largest nymphal aggregations were those coded as "not on the plant," suggesting that larger groups may move more often in search of suitable seed pods.  (Berenbaum & Miliczky, 1984;Duffey & Scudder, 1972;Sillén-Tullberg et al., 1982). The non-Hemiptera misidentifications were much less common but included Tetratopes sp., which also use milkweed (Asclepias sp.), or beetles with similar color patterns (Calopteron, Chauliongnathus, Liliocerus). This suggests that citizen scientists, rather than being poor identifiers of insect taxa, can be led astray in predictable patterns or by organisms sharing the same color patterns, habitats, or ecosystems. Patterns in misidentification can guide future uses of iNaturalist data and efforts to improve identifications, as they allow prediction of potentially confounding species, and research users are usually aware of known mimics of target species.

| Accuracy of identifications on citizen science observations
However, our study demonstrates some meaningful limitations in the utility of these citizen science observations. First, the quality and angle of the photos precluded assessment of specific nymphal instar stages. This limitation influenced the degree of specificity our models of nymphal development could attain. Moreover, because photographs capture a single moment in time for these individuals instead of monitoring an individual adult, egg, or nymphal aggregation over time, we do not have specific egg deposition or eclosion dates for nymphal clusters. This prevented us from using more fine-grained measures of temperature and its impacts on development, which contributed to our weak correlations between growing degree days and development ( Figure 6) and to our lower-than-anticipated model fits (Table 1). Additionally, we selected O. fasciatus because it is a widely photographed, charismatic species on an actively monitored host plant, which increased our sample sizes to be able to assess phenology and life history questions across the range. However, for other less highly photographed species, the limitations in scale and temporal scope may be more substantial and prevent these analyses.
Finally, known spatial and temporal biases exist in opportunistic citizen science data (Callaghan et al., 2019;Courter et al., 2013;Tang et al., 2021). Citizen scientists tend to collect data in the warmer months of the year and in more densely populated regions and green spaces near major metropolitan areas. By jointly modeling latitude and longitude to account for spatial structure, we are able to account for some of this limitation, but we recommend cautious interpretation in analyses of this type.
Despite these limitations, citizen science observations documented the location and phenology of O. fasciatus and simultaneously documented the mating behavior and the age structure of nymphal groups, as well as interactions with host plants. We have only scratched the surface of opportunities to explore these relationships in citizen science data, even within this species. For example, we did not consider questions related to the sex ratio in observation photos, but many other species may be more readily visually distinguished by sex. Furthermore, we did not differentiate between instar stages in these observations, which would enable finer-grained investigations of developmental progressions and their relationship to climate variability. Finally, while we included information about the host plant through the nymphal group plant part occupancy, photos were often too close-up to determine the host plant species identity or fully document the phenology of the host plant.
Future work investigating relationships between host plant phenology and insect phenology will be enabled by clear documentation of the identity and phenological status of the host plant as well as the arthropod community. Encouraging documentation of these types of species interaction data in citizen science platforms will further enable future work to capitalize on the wealth of information captured by citizen scientists.

ACK N OWLED G M ENTS
We are grateful to the more than 5000 iNaturalist observers who submitted observations and to the more than 1000 identifiers, particularly Michael Pirrello and Fabien Piednoir. We also thank participants in the Earth Science Information Partner's summer meeting (2019) and the Digital Data Conference (2020) and anonymous reviewers for feedback and comments.

CO N FLI C T O F I NTE R E S T S TATE M E NT
The authors declare no competing conflicts of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
The Oncopeltus fasciatus occurrence dataset from iNaturalist is available through iNaturalist and as a darwinCore occurrence dataset through GBIF, and the full dataset, including author-generated

PE R M I SS I O N TO R E PRO D U CE M ATE R I A L S FRO M OTH E R S O U RCE S
None.